Webpage Classification based on Compound of Using HTML Features & URL Features and Features of Sibling Pages
نویسندگان
چکیده
Webpage classification plays an important role in information organization and retrieval. It involves assignment of one webpage to one or more than one predetermined categories. The uncontrolled features of web content implies that more work is required for webpage classification compared with traditional text classification. The interconnected nature of hyper text, however, carries some features which contribute to the process, for example HTML Tags and URL features of a webpage. This study illustrates that using such features along with features of sibling pages, i.e. pages from the same sibling, as well as Bayesian algorithm for combining the results of these features, it would be possible to improve the accuracy of webpage classification based on this algorithm.
منابع مشابه
A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection
Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...
متن کاملFeature-based Malicious URL and Attack Type Detection Using Multi-class Classification
Nowadays, malicious URLs are the common threat to the businesses, social networks, net-banking etc. Existing approaches have focused on binary detection i.e. either the URL is malicious or benign. Very few literature is found which focused on the detection of malicious URLs and their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...
متن کاملJoint Web-Feature (JFEAT): A Novel Web Page Classification Framework
With the increasing amount of web pages over the internet, it has been a major concern to obtain information on the internet accurately at a reasonable cost with decent performance. A potential solution is through the classification of web pages into meaningful categories. An effective classification of web pages is of benefit to various applications such as web mining and search engines. Unlik...
متن کاملAnalyzing new features of infected web content in detection of malicious web pages
Recent improvements in web standards and technologies enable the attackers to hide and obfuscate infectious codes with new methods and thus escaping the security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...
متن کاملBody Mass Index Classification based on Facial Features using Machine Learning Algorithms for utilizing in Telemedicine
Background and Objectives: Due to the impact of controlling BMI on life, BMI classification based on facial features can be used for developing Telemedicine systems and eliminating the limitations of measuring tools, especially for paralyzed people. So that physicians can help people online during the Covid-19 pandemic. Method: In this study, new features and some previous work features were e...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Int. J. Adv. Comp. Techn.
دوره 2 شماره
صفحات -
تاریخ انتشار 2010